Using the uniqueness of global identifiers to determine the provenance of Python software source code
Abstract
We consider the problem of identifying the provenance of free/open source software (FOSS), and specifically the need to identify where reused source code has been copied from. We propose a lightweight approach to solve the problem based on software identifiers, such as the names of variables, classes, and functions chosen by programmers. The proposed approach is able to efficiently narrow down to a small set of candidate origin products, to be further analyzed with more expensive techniques to make a final provenance determination.

By analyzing the PyPI (Python Packaging Index) open source ecosystem, we find that globally defined identifiers are very distinct. Across PyPI's 244 K packages we found 11.2 M different global identifiers (class names and method/function names, with only 0.6% of identifiers shared among the two types of entities); 76% of identifiers were used in only one package, and 93% were used in at most 3. Randomly selecting 3 non-frequent global identifiers from an input product is enough to narrow down its origins to a maximum of 3 products within 89% of the cases.

We validate the proposed approach by mapping Debian source packages implemented in Python to the corresponding PyPI packages; this approach uses at most five trials, where each trial uses three randomly chosen global identifiers from a randomly chosen Python file of the subject software package, then ranks the results using a popularity index and requires inspecting only the top result. In our experiments, this method is effective at finding the true origin of a project, with a recall of 0.9 and a precision of 0.77.
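The lookup described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`global_identifiers`, `build_index`, `candidate_origins`), the use of Python's `ast` module to collect class, function, and method names, the rule that candidates are the intersection of the packages containing each sampled identifier, and the `popularity` mapping are all hypothetical choices made for the example.

```python
import ast
import random
from collections import defaultdict


def global_identifiers(source: str) -> set[str]:
    """Collect module-level class and function names, plus method names."""
    names = set()
    func_types = (ast.FunctionDef, ast.AsyncFunctionDef)
    for node in ast.parse(source).body:
        if isinstance(node, func_types):
            names.add(node.name)
        elif isinstance(node, ast.ClassDef):
            names.add(node.name)
            for item in node.body:
                if isinstance(item, func_types):
                    names.add(item.name)
    return names


def build_index(corpus: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each identifier to the set of packages that define it.

    `corpus` maps a package name to the source text of its files
    (a stand-in for a real crawl of a package index).
    """
    index = defaultdict(set)
    for package, sources in corpus.items():
        for source in sources:
            for name in global_identifiers(source):
                index[name].add(package)
    return index


def candidate_origins(source, index, popularity,
                      sample_size=3, max_packages=3, trials=5, rng=None):
    """Return candidate origin packages, most popular first.

    Assumption: an identifier counts as "non-frequent" when it is known
    to the index and occurs in at most `max_packages` packages.
    """
    rng = rng or random.Random()
    rare = sorted(
        name for name in global_identifiers(source)
        if 0 < len(index.get(name, ())) <= max_packages
    )
    if not rare:
        return []
    for _ in range(trials):
        chosen = rng.sample(rare, min(sample_size, len(rare)))
        # Assumption: keep only packages that contain every sampled name.
        candidates = set.intersection(*(index[name] for name in chosen))
        if candidates:
            return sorted(candidates,
                          key=lambda pkg: (-popularity.get(pkg, 0), pkg))
    return []
```

In this sketch a caller would build the index once over the package corpus, then call `candidate_origins` on one file of the software under study and inspect the first entry of the returned list. The cheaper lookup only narrows the search; a more expensive comparison would still be needed to confirm the origin.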
Journal
Journal title: Empirical Software Engineering
Year: 2023
ISSN: 1382-3256, 1573-7616
DOI: https://doi.org/10.1007/s10664-023-10317-8